Design of FPGA accelerator with high parallelism for convolution neural network
WANG Xiaofeng, JIANG Penglong, ZHOU Hui, ZHAO Xiongbo
Journal of Computer Applications    2021, 41 (3): 812-819.   DOI: 10.11772/j.issn.1001-9081.2020060996
Most algorithms based on Convolutional Neural Networks (CNN) are computation-intensive and memory-intensive, which makes them difficult to apply in low-power embedded fields such as aerospace, mobile robotics and smartphones. To address this problem, a Field Programmable Gate Array (FPGA) accelerator with high parallelism for CNN was proposed. Firstly, four kinds of parallelism in CNN algorithms that can be exploited for FPGA acceleration were compared and studied. Then, a Multi-channel Convolutional Rotating-register Pipeline (MCRP) structure was proposed to concisely and effectively exploit the convolution-kernel parallelism of the CNN algorithm. Finally, using a strategy of input/output channel parallelism plus convolution-kernel parallelism, a highly parallel CNN accelerator architecture was proposed based on the MCRP structure; to verify the rationality of the design, it was deployed on the XILINX XCZU9EG chip. With the on-chip Digital Signal Processor (DSP) resources fully utilized, the peak computing capacity of the proposed CNN accelerator reached 2 304 GOPS (Giga Operations Per Second). Taking the SSD-300 algorithm as the test object, the accelerator achieved an actual computing capacity of 1 830.33 GOPS and a hardware utilization rate of 79.44%. Experimental results show that the MCRP structure can effectively improve the computing capacity of a CNN accelerator, and that a CNN accelerator based on the MCRP structure can meet the computing-capacity requirements of most applications in the embedded field.
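As a back-of-the-envelope check of the figures quoted above, the Python sketch below (not from the paper) parameterizes peak throughput by the three parallelism dimensions named in the abstract and reproduces the 79.44% utilization from the reported 1 830.33 GOPS and 2 304 GOPS. The parallelism factors and clock frequency are placeholders chosen only so that the product matches the reported peak; they are not the paper's actual configuration on the XCZU9EG.

```python
# Hypothetical sketch: relate parallelism factors to peak throughput and check
# the utilization figure quoted in the abstract (a MAC counts as 2 operations).

def peak_gops(in_ch_par: int, out_ch_par: int, kernel_par: int, freq_mhz: float) -> float:
    """Peak throughput in GOPS for a fully pipelined MAC array."""
    macs_per_cycle = in_ch_par * out_ch_par * kernel_par
    return 2 * macs_per_cycle * freq_mhz / 1e3

# Placeholder factors chosen only so the product equals the reported 2 304 GOPS;
# the accelerator's real channel/kernel parallelism and clock may differ.
peak = peak_gops(in_ch_par=16, out_ch_par=80, kernel_par=3, freq_mhz=300.0)  # -> 2304.0
actual = 1830.33                      # measured on SSD-300, from the abstract
print(f"peak = {peak:.0f} GOPS, utilization = {actual / peak:.2%}")          # ~79.44%
```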
Hyperspectral face recognition system based on VGGNet and multi-band recurrent network
XIE Zhihua, JIANG Peng, YU Xinhe, ZHANG Shuai
Journal of Computer Applications    2019, 39 (2): 388-391.   DOI: 10.11772/j.issn.1001-9081.2018081788
To improve the effectiveness of facial features represented by hyperspectral face data, a hyperspectral face recognition method based on VGGNet and multi-band recurrent training was proposed. Firstly, in the preprocessing phase, a Multi-Task Convolutional Neural Network (MTCNN) was used to locate the hyperspectral face image accurately, and the hyperspectral face data was enhanced by channel mixing. Then, a VGG12 deep network based on the Convolutional Neural Network (CNN) structure was built for hyperspectral face recognition. Finally, multi-band recurrent training was introduced to train the VGG12 network and perform recognition according to the characteristics of hyperspectral face data. Experimental results on the UWA-HSFD and PolyU-HSFD databases show that the proposed method is superior to other deep networks such as DeepID, DeepFace and VGGNet.
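The abstract does not specify the VGG12 architecture or the band-training schedule in detail; the PyTorch sketch below is only one plausible reading of "multi-band recurrent training", in which a single (here deliberately shallow) VGG-style network is updated repeatedly while cycling over the spectral bands. The layer sizes, band count and data shapes are placeholders, not the authors' settings.

```python
# Hypothetical sketch (PyTorch), not the authors' code: a small VGG-style CNN
# trained band by band, so the same weights see every spectral band in turn.
import torch
import torch.nn as nn

class SmallVGG(nn.Module):
    """Stand-in for the VGG12 network described in the abstract (depth reduced)."""
    def __init__(self, num_classes: int, in_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def multiband_recurrent_train(model, bands, labels, epochs=5, lr=1e-3):
    """bands: list of tensors, one (N, 1, H, W) batch per spectral band."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x in bands:                          # cycle over the spectral bands
            opt.zero_grad()
            loss = loss_fn(model(x), labels)
            loss.backward()
            opt.step()

# Toy usage: 4 bands of 8 face crops (64x64) from 10 subjects.
model = SmallVGG(num_classes=10)
bands = [torch.randn(8, 1, 64, 64) for _ in range(4)]
labels = torch.randint(0, 10, (8,))
multiband_recurrent_train(model, bands, labels, epochs=1)
```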
Transmission and scheduling scheme based on W-learning algorithm in wireless networks
ZHU Jiang, PENG Zhenzhen, ZHANG Yuping
Journal of Computer Applications    2013, 33 (11): 3005-3009.  
To address the transmission problem in wireless networks, a transmission and scheduling scheme based on the W-learning algorithm was proposed. The system was modeled as a Markov Decision Process (MDP), and with the help of the W-learning algorithm the scheme scheduled transmissions intelligently: by choosing which packet to transmit and which transmission mode to use, packet loss was reduced under the premise of energy saving. The curse of dimensionality was overcome by a state-aggregation method, and the number of actions was reduced by an action-set reduction scheme; the storage-space compression ratio was 41% for successive approximation and 43% for the W-learning algorithm. Finally, simulation results were given to evaluate the performance of the scheme, showing that the proposed scheme can transmit as much data as possible while saving energy, and that the state-aggregation method and the action-set reduction scheme can simplify the computation with little influence on algorithm performance.
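Since the abstract names W-learning without spelling out the updates, the Python sketch below shows one common tabular formulation applied to a toy version of this scheduling problem: an energy-saving sub-agent and a packet-loss sub-agent each learn Q-values, and per-state W-values decide whose preferred transmission action is executed. The environment, rewards, state/action sets and learning constants are placeholders, not the paper's MDP or update rules.

```python
# Hypothetical sketch, not the paper's model: tabular W-learning competition
# between an energy-saving sub-agent and a packet-loss sub-agent.
import random
from collections import defaultdict

ACTIONS = ["idle", "tx_low_power", "tx_high_power"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

class SubAgent:
    def __init__(self):
        self.Q = defaultdict(float)   # (state, action) -> value
        self.W = defaultdict(float)   # state -> how much the agent wants its way

    def best(self, s):
        return max(ACTIONS, key=lambda a: self.Q[(s, a)])

    def update_q(self, s, a, r, s2):
        target = r + GAMMA * max(self.Q[(s2, b)] for b in ACTIONS)
        self.Q[(s, a)] += ALPHA * (target - self.Q[(s, a)])

    def update_w(self, s, r, s2):
        # Loss suffered because another agent's action was executed instead.
        loss = self.Q[(s, self.best(s))] - (r + GAMMA * max(self.Q[(s2, b)] for b in ACTIONS))
        self.W[s] += ALPHA * (loss - self.W[s])

def step(state, action):
    """Toy environment: queue length evolves with random arrivals and service."""
    served = {"idle": 0, "tx_low_power": 1, "tx_high_power": 2}[action]
    queue = min(max(0, state + random.randint(0, 2) - served), 5)
    energy_r = {"idle": 0.0, "tx_low_power": -1.0, "tx_high_power": -2.0}[action]
    loss_r = -1.0 if queue == 5 else 0.0       # full buffer -> packet-loss penalty
    return queue, (energy_r, loss_r)

energy_agent, loss_agent = SubAgent(), SubAgent()
state = 0
for _ in range(5000):
    agents = [energy_agent, loss_agent]
    winner = max(agents, key=lambda ag: ag.W[state])        # highest W gets its way
    action = winner.best(state) if random.random() > EPSILON else random.choice(ACTIONS)
    nxt, rewards = step(state, action)
    for ag, r in zip(agents, rewards):
        ag.update_q(state, action, r, nxt)
        if ag is not winner:
            ag.update_w(state, r, nxt)                      # losers learn their W
    state = nxt
```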